Improve performance and reduce latency of Transport.write by attempting to send data immediately if all write buffers are empty #619
This also increases aiohttp WebSocket echo performance by 20-25%.
Force-pushed from be094bc to 96dc94c.
Hi @fantix, I saw you recently committed something to uvloop. Do you know if someone could take a look at this PR and maybe give some feedback?
Yeah, I'll go through all the issues/PRs again and include some in the final 0.21 release. It's all just taking some time, unfortunately. Thanks for the PR and your kind patience!
…is removes unnecessary syscall
Hi @fantix, sorry to bother you. I know this is open-source, voluntary work and you're very busy, but would you have some time to look at this (and my other PRs) any time soon? Also, 0.21.0 has already been released :)
Can someone from the contributors have a look at this PR? It would be nice to improve performance!
@MagicStack guys, please finish this CR.
Updates on this? @fantix
@fantix, are you planning to review and release this soon? 🙏
@1st1, is there any news about this PR?
Sorry about the delay! I'm back on this one now. |
fantix left a comment:
Actually, instead of replicating the fast-path code, maybe we can just relax the fast-path condition, like this:
```diff
@@ -424,7 +424,7 @@ cdef class UVStream(UVBaseTransport):
     cdef inline _initiate_write(self):
         if (not self._protocol_paused and
                 (<uv.uv_stream_t*>self._handle).write_queue_size == 0 and
-                self._buffer_size > self._high_water):
+                (self._buffer_size > self._high_water or len(self._buffer) == 1)):
             # Fast-path. If:
             # - the protocol isn't yet paused,
             # - there is no data in libuv buffers for this stream,
```

(kind of a follow-up of 46dd8f3)
I believe this would give the same performance boost as this PR while remaining compatible with the current code base.
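To make the proposed condition concrete, here is a minimal Python sketch of the relaxed predicate written as a pure function. The function and parameter names are hypothetical, and it assumes that `len(self._buffer) == 1` means the buffer holds only the chunk just passed to write(); it is an illustration, not uvloop's actual code.

```python
def should_take_fast_path(protocol_paused: bool,
                          write_queue_size: int,
                          buffer_size: int,
                          high_water: int,
                          buffered_chunks: int) -> bool:
    """Hypothetical stand-in for the relaxed _initiate_write() condition.

    Assumption: buffered_chunks == 1 means the buffer holds only the
    chunk just passed to write(), so nothing older is waiting.
    """
    if protocol_paused:
        # Flow control is active: keep the data buffered.
        return False
    if write_queue_size != 0:
        # libuv still has queued data for this stream: no fast path.
        return False
    # Original rule: only when above the high-water mark.
    # Relaxed rule: also when this is the only chunk in the buffer.
    return buffer_size > high_water or buffered_chunks == 1
```

With a 64 KiB high-water mark, a single small write on an idle stream now qualifies (`should_take_fast_path(False, 0, 100, 65536, 1)` is `True`), whereas the original condition would have required `buffer_size > high_water`.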
Hi @fantix, the last one gives a decent boost; I would definitely suggest merging it. Also, slightly unrelated, but is there any chance we can figure out why tests are failing sporadically? For my PRs I've just been re-running the tests 2-3 times until they pass :)
Roger, I will get to them soon. I've spotted at least 2 more flaky tests; I'll fix them first (or skip the non-critical ones for now).
Done.
```cython
if used_buf:
    PyBuffer_Release(&py_buf)
```
I added PyBuffer_Release for consistency, but I think the outer `if blen == 0:` check is not really necessary.
We already filter out empty data objects twice: first in write(), then again in _buffer_write().
We can probably remove the check in write() and keep it only in _buffer_write(), since both write() and writelines() rely on it.
I can do that, but I'd rather put it in a separate PR.
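The refactoring suggested above, filtering empty data in one place only, could look roughly like the toy Python class below. BufferSketch and its attributes are hypothetical names used for illustration; the real UVStream is written in Cython and stores its buffer differently.

```python
from typing import Iterable, List, Union

BytesLike = Union[bytes, bytearray, memoryview]


class BufferSketch:
    """Hypothetical toy transport buffer with a single empty-data check."""

    def __init__(self) -> None:
        self._buffer: List[BytesLike] = []
        self._buffer_size = 0

    def _buffer_write(self, data: BytesLike) -> None:
        # The only place where empty chunks are filtered out.
        size = memoryview(data).nbytes
        if size == 0:
            return
        self._buffer.append(data)
        self._buffer_size += size

    def write(self, data: BytesLike) -> None:
        # No separate emptiness check: _buffer_write() already has one.
        self._buffer_write(data)

    def writelines(self, lines: Iterable[BytesLike]) -> None:
        # Relies on the same check, chunk by chunk.
        for data in lines:
            self._buffer_write(data)
```

Both entry points end up in `_buffer_write()`, so removing the check from `write()` does not let empty chunks into the buffer.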
The current implementation of UVStream.write almost never sends data immediately. Instead, the data is stored in the buffer and picked up later by the uv_check callback.
This introduces unnecessary latency and CPU overhead.
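To illustrate where the extra latency comes from, the sketch below models the deferred approach with a plain socket pair: write() only appends to a list, and the bytes reach the peer when a separate on_check() step runs, standing in for the loop's uv_check callback. DeferredWriter, on_check(), readable_now() and demo() are hypothetical names; this is a simplified model, not uvloop's implementation.

```python
import socket
from typing import List, Tuple


class DeferredWriter:
    """Hypothetical model of buffer-now, flush-later writes."""

    def __init__(self, sock: socket.socket) -> None:
        self._sock = sock
        self._buffer: List[bytes] = []

    def write(self, data: bytes) -> None:
        # Nothing touches the socket here; the data just waits.
        self._buffer.append(data)

    def on_check(self) -> None:
        # Stand-in for the per-iteration check callback: flush everything.
        if self._buffer:
            payload = b"".join(self._buffer)
            self._buffer.clear()
            self._sock.sendall(payload)


def readable_now(sock: socket.socket) -> bytes:
    """Return whatever the peer can read right now, without blocking."""
    sock.setblocking(False)
    try:
        return sock.recv(65536)
    except BlockingIOError:
        return b""


def demo() -> Tuple[bytes, bytes]:
    a, b = socket.socketpair()
    try:
        writer = DeferredWriter(a)
        writer.write(b"ping")
        before = readable_now(b)  # still buffered in user space
        writer.on_check()
        b.settimeout(1.0)
        after = b.recv(64)
        return before, after
    finally:
        a.close()
        b.close()
```

Calling `demo()` returns `(b"", b"ping")`: the peer sees nothing until `on_check()` runs, which in a real event loop means waiting for the next loop iteration.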
Although the change is confined to UVStream, it also directly benefits _SSLProtocolTransport.write latency, since _SSLProtocolTransport uses UVStream.write to send SSL frames.
This PR increases the RPS between echoclient and echoserver by roughly 10%.
```
echoclient --worker 1 --num 200000
echoserver --uvloop --proto
```

Because of this change, a couple of tests had to be tweaked.
Please note that asyncio's current implementation does the same thing: it first tries to write directly to the socket, and only adds data to the buffer if the socket would block (EWOULDBLOCK) or accepts only part of it:
https://github.com/python/cpython/blob/c13e7d98fb8581014a225b900b1b88ccbfc28097/Lib/asyncio/selector_events.py#L1065
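A minimal Python sketch of that try-first strategy, assuming a non-blocking socket and ignoring flow control and writability callbacks. ImmediateWriter is a hypothetical name, and the logic only loosely follows asyncio's _SelectorSocketTransport.write rather than reproducing it.

```python
import socket


class ImmediateWriter:
    """Hypothetical try-first writer: send now, buffer only the remainder."""

    def __init__(self, sock: socket.socket) -> None:
        sock.setblocking(False)
        self._sock = sock
        self._buffer = bytearray()

    def write(self, data: bytes) -> None:
        if not data:
            return
        if self._buffer:
            # Older bytes are still waiting; append to preserve ordering.
            self._buffer += data
            return
        try:
            # Immediate attempt, without waiting for another loop iteration.
            # Errors other than would-block/EINTR propagate in this sketch.
            n = self._sock.send(data)
        except (BlockingIOError, InterruptedError):
            n = 0  # EWOULDBLOCK/EINTR: nothing was written
        if n < len(data):
            # Partial write or would-block: keep the rest. A real transport
            # would now register a writability callback to flush it later.
            self._buffer += data[n:]

    def pending(self) -> int:
        return len(self._buffer)
```

A small write on an idle socket leaves `pending()` at 0 because the bytes go straight to the kernel; only an oversized write (or a write while older data is still queued) touches the user-space buffer.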
Apart from that:
- Relaxed the fast-path condition further and also removed the `self._buffer_size > self._high_water` check in _exec_write; this seems to work fine.
- Added primitive return types to _try_write and _exec_write. Previously, the values returned by _try_write were compared through PyObject_RichCompare calls.
- Simplified the meaning of _try_write's return value: it now just returns the number of bytes written, or -1 on a fatal error.
- Removed some signed/unsigned and Py_ssize_t -> int conversions in _exec_write and _try_write; Py_ssize_t is now used directly, without conversion, where possible.
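The simplified _try_write() contract (bytes written, or -1 on a fatal error) and a caller that branches on a plain integer can be sketched in Python as follows. try_write() and exec_write() are hypothetical stand-ins, and returning 0 when the socket would block is an assumption of this sketch. In the actual Cython code the gain comes from declaring a C-level return type such as Py_ssize_t, which pure Python cannot show.

```python
import socket


def try_write(sock: socket.socket, data: bytes) -> int:
    """Return bytes written, 0 if the socket would block, or -1 on fatal error.

    Returning 0 on would-block is an assumption of this sketch.
    """
    try:
        return sock.send(data)
    except (BlockingIOError, InterruptedError):
        return 0
    except OSError:
        # Any other socket error is treated as fatal.
        return -1


def exec_write(sock: socket.socket, buffer: bytearray) -> bool:
    """Flush as much of buffer as possible; return False on a fatal error."""
    if not buffer:
        return True
    n = try_write(sock, bytes(buffer))
    if n < 0:
        return False
    del buffer[:n]  # drop the bytes that were written
    return True
```

Because the result is a single integer with one sentinel, the caller needs only one `n < 0` comparison before trimming the buffer.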